130 research outputs found

Quantum random walks without walking

Author: J. B. Wang
K. Manouchehri
Z. Bar-Yossef
Publication venue: 'American Physical Society (APS)'
Publication date: 17/12/2009

Quantum random walks have received much interest due to their non-intuitive dynamics, which may hold the key to a new generation of quantum algorithms. What remains a major challenge is a physical realization that is experimentally viable and not limited to special connectivity criteria. We present a scheme for walking on arbitrarily complex graphs, which can be realized using a variety of quantum systems such as a BEC trapped inside an optical lattice. This scheme is particularly elegant since the walker is not required to physically step between the nodes; flipping coins alone is sufficient.

Comment: 12 manuscript pages, 3 figures

arXiv.org e-Print Archive

Crossref

Lower bounds in differential privacy

Author: C. Dwork
P. Erdős
Z. Bar-Yossef
Publication date: 21/12/2011

This is a paper about private data analysis, in which a trusted curator holding a confidential database responds to real vector-valued queries. A common approach to ensuring privacy for the database elements is to add appropriately generated random noise to the answers, releasing only these noisy responses. In this paper, we investigate various lower bounds on the noise required to maintain different kinds of privacy guarantees.

Comment: Corrected some minor errors and typos. To appear in Theory of Cryptography Conference (TCC) 201
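The noise-addition approach described in this abstract can be illustrated with the standard Laplace mechanism. This is a minimal sketch and not the paper's construction: the function names, the `l1_sensitivity` parameter, and the choice of Laplace noise are assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample zero-mean Laplace noise with the given scale parameter."""
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace(0, scale) distribution.
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_response(true_answer, l1_sensitivity, epsilon, rng=None):
    """Return a vector-valued answer with independent Laplace noise added.

    `l1_sensitivity` (hypothetical parameter name) is the largest L1 change
    in the answer when one database element changes; `epsilon` is the
    privacy parameter. Smaller epsilon means more noise.
    """
    if epsilon <= 0 or l1_sensitivity <= 0:
        raise ValueError("epsilon and l1_sensitivity must be positive")
    rng = rng or random.Random()
    scale = l1_sensitivity / epsilon
    return [value + laplace_noise(scale, rng) for value in true_answer]
```

The curator would release only the output of `noisy_response`, never `true_answer`. The lower bounds studied in the paper concern how small such noise can be made while still preserving privacy.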

arXiv.org e-Print Archive

CiteSeerX

Crossref

Sampling Triples from Restricted Networks Using MCMC Strategy

Author: Bar-Yossef Z.
Geweke J.
Rahman M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

In large networks, connected triples are useful for solving various tasks including link prediction, community detection, and spam filtering. Existing work in this direction is mostly concerned with exact or approximate counting of closed connected triples (i.e., triangles). The task of triple sampling, by contrast, has not been explored in depth, although sampling is a more fundamental task than counting and is useful for solving various other tasks, including counting. In recent years, some triple sampling methods based on direct sampling have been proposed, solely for the purpose of approximating triangle counts. They sample only from a uniform distribution and are not effective for sampling triples from an arbitrary user-defined distribution. In this work we present two indirect triple sampling methods based on the Markov Chain Monte Carlo (MCMC) sampling strategy. Both methods are highly efficient compared to a direct sampling-based method, specifically for the task of sampling from a non-uniform probability distribution. Another significant advantage of the proposed methods is that they can sample triples from networks that have restricted access, on which a direct sampling-based method is simply not applicable.
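To make the idea of MCMC sampling over triples concrete, here is a generic Metropolis-Hastings walk whose states are connected triples. It is a sketch under stated assumptions and not either of the two methods in the paper: the neighborhood definition (swap one vertex), the `weight` callback, and all function names are hypothetical. It touches the graph only through neighbor lookups, which is the kind of access a restricted network typically allows.

```python
import random


def is_connected_triple(adj, triple):
    """A 3-vertex set is connected when it induces at least two edges."""
    a, b, c = triple
    edges = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
    return edges >= 2


def triple_neighbors(adj, triple):
    """Connected triples reachable by swapping exactly one vertex."""
    found = set()
    for removed in triple:
        kept = triple - {removed}
        # Only vertices adjacent to a kept vertex can keep the triple connected.
        candidates = set().union(*(adj[v] for v in kept)) - triple
        for w in candidates:
            new_triple = kept | {w}
            if is_connected_triple(adj, new_triple):
                found.add(new_triple)
    return sorted(found, key=sorted)  # deterministic order for reproducibility


def mh_step(adj, state, weight, rng):
    """One Metropolis-Hastings step targeting a distribution proportional to weight."""
    nbrs = triple_neighbors(adj, state)
    if not nbrs:
        return state
    proposal = rng.choice(nbrs)
    back = triple_neighbors(adj, proposal)
    # Proposal probabilities are 1/len(nbrs) forward and 1/len(back) backward.
    ratio = (weight(proposal) * len(nbrs)) / (weight(state) * len(back))
    return proposal if rng.random() < min(1.0, ratio) else state


def sample_triples(adj, start, weight, steps, rng):
    """Run the chain from a connected starting triple and return visited states."""
    state = frozenset(start)
    visited = []
    for _ in range(steps):
        state = mh_step(adj, state, weight, rng)
        visited.append(state)
    return visited
```

Here `adj` maps each vertex to the set of its neighbors, and `weight` must return a positive number for every connected triple. Passing `lambda t: 1.0` targets the uniform distribution; any other positive function gives a user-defined target.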

Crossref

IUPUIScholarWorks

Selectivity estimation on streaming spatio-textual data using local correlations

Author: Bar-Yossef Z.
Publication venue: 'VLDB Endowment'

Crossref

Quantified Derandomization of Linear Threshold Circuits

Author: Bar-Yossef Z.
Cheng Kuan
Impagliazzo R.
Tamaki Suguru
Williams Ryan
Publication date: 06/11/2017

One of the prominent current challenges in complexity theory is the attempt to prove lower bounds for $TC^0$, the class of constant-depth, polynomial-size circuits with majority gates. Relying on the results of Williams (2013), an appealing approach to prove such lower bounds is to construct a non-trivial derandomization algorithm for $TC^0$. In this work we take a first step towards the latter goal, by proving the first positive results regarding the derandomization of $TC^0$ circuits of depth $d>2$. Our first main result is a quantified derandomization algorithm for $TC^0$ circuits with a super-linear number of wires. Specifically, we construct an algorithm that gets as input a $TC^0$ circuit $C$ over $n$ input bits with depth $d$ and $n^{1+\exp(-d)}$ wires, runs in almost-polynomial time, and distinguishes between the case that $C$ rejects at most $2^{n^{1-1/5d}}$ inputs and the case that $C$ accepts at most $2^{n^{1-1/5d}}$ inputs. In fact, our algorithm works even when the circuit $C$ is a linear threshold circuit, rather than just a $TC^0$ circuit (i.e., $C$ is a circuit with linear threshold gates, which are stronger than majority gates). Our second main result is that even a modest improvement of our quantified derandomization algorithm would yield a non-trivial algorithm for standard derandomization of all of $TC^0$, and would consequently imply that $NEXP\not\subseteq TC^0$. Specifically, if there exists a quantified derandomization algorithm that gets as input a $TC^0$ circuit with depth $d$ and $n^{1+O(1/d)}$ wires (rather than $n^{1+\exp(-d)}$ wires), runs in time at most $2^{n^{\exp(-d)}}$, and distinguishes between the case that $C$ rejects at most $2^{n^{1-1/5d}}$ inputs and the case that $C$ accepts at most $2^{n^{1-1/5d}}$ inputs, then there exists an algorithm with running time $2^{n^{1-\Omega(1)}}$ for standard derandomization of $TC^0$.

Comment: Changes in this revision: An additional result (a PRG for quantified derandomization of depth-2 LTF circuits); rewrite of some of the exposition; minor correction

arXiv.org e-Print Archive

Crossref

Parallel Repetition of Entangled Games with Exponential Decay via the Superposed Information Cost

Author: J. Håstad
J.F. Clauser
M. Bellare
R. Cleve
R. Raz
U. Feige
Z. Bar-Yossef
Publication date: 01/01/2013

In a two-player game, two cooperating but non-communicating players, Alice and Bob, receive inputs taken from a probability distribution. Each of them produces an output, and they win the game if they satisfy some predicate on their inputs/outputs. The entangled value $\omega^*(G)$ of a game $G$ is the maximum probability that Alice and Bob can win the game if they are allowed to share an entangled state prior to receiving their inputs. The $n$-fold parallel repetition $G^n$ of $G$ consists of $n$ instances of $G$ where the players receive all the inputs at the same time and produce all the outputs at the same time. They win $G^n$ if they win each instance of $G$. In this paper we show that for any game $G$ such that $\omega^*(G) = 1 - \varepsilon < 1$, $\omega^*(G^n)$ decreases exponentially in $n$. First, for any game $G$ on the uniform distribution, we show that $\omega^*(G^n) = (1 - \varepsilon^2)^{\Omega\left(\frac{n}{\log(|I||O|)} - |\log(\varepsilon)|\right)}$, where $|I|$ and $|O|$ are the sizes of the input and output sets. From this result, we show that for any entangled game $G$, $\omega^*(G^n) \le (1 - \varepsilon^2)^{\Omega\left(\frac{n}{Q\log(|I||O|)} - \frac{|\log(\varepsilon)|}{Q}\right)}$, where $p$ is the input distribution of $G$ and $Q = \frac{|I|^2 \max_{xy} p_{xy}^2}{\min_{xy} p_{xy}}$. This implies parallel repetition with exponential decay as long as $\min_{xy} \{p_{xy}\} \neq 0$ for general games. To prove this parallel repetition, we introduce the concept of Superposed Information Cost for entangled games, which is inspired by the information cost used in communication complexity.

Comment: In the first version of this paper we presented a different, stronger Corollary 1 but due to an error in the proof we had to modify it in the second version. This third version is a minor update. We correct some typos and re-introduce a proof accidentally commented out in the second version.

arXiv.org e-Print Archive

Crossref

CWI's Institutional Repository

INRIA a CCSD electronic archive server

FLEET: Butterfly Estimation from a Bipartite Graph Stream

Author: Bar-Yossef R. Kumar Z.
Bera Suman K
Braverman Vladimir
Kane Daniel M
Li Lin
Liu Boge
Mehta Aranyak
Milo Ron
Shin Kijung
Turk Ata
Zhu Rong
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 28/08/2019

We consider space-efficient single-pass estimation of the number of butterflies, a fundamental bipartite graph motif, from a massive bipartite graph stream where each edge represents a connection between entities in two different partitions. We present a space lower bound for any streaming algorithm that can estimate the number of butterflies accurately, as well as FLEET, a suite of algorithms for accurately estimating the number of butterflies in the graph stream. Estimates returned by the algorithms come with provable guarantees on the approximation error, and experiments show good tradeoffs between the space used and the accuracy of approximation. We also present space-efficient algorithms for estimating the number of butterflies within a sliding window of the most recent elements in the stream. While there is a significant body of work on counting subgraphs such as triangles in a unipartite graph stream, our work seems to be one of the few to tackle the case of bipartite graph streams.

Comment: This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Seyed-Vahid Sanei-Mehri, Yu Zhang, Ahmet Erdem Sariyuce and Srikanta Tirthapura. "FLEET: Butterfly Estimation from a Bipartite Graph Stream". The 28th ACM International Conference on Information and Knowledge Management
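For readers unfamiliar with the motif, a butterfly is a complete 2x2 bipartite subgraph: two left vertices that share two right neighbors. The sketch below counts butterflies exactly on a small edge list and adds a simple fixed-probability edge-sampling estimator. It is an illustration only and not the FLEET algorithms themselves; the function names and the `keep_probability` parameter are assumptions.

```python
import random
from collections import defaultdict
from itertools import combinations


def count_butterflies(edges):
    """Exact butterfly count for a bipartite edge list of (left, right) pairs."""
    neighbors = defaultdict(set)
    for left, right in edges:
        neighbors[left].add(right)
    total = 0
    # Two left vertices with c common right neighbors form C(c, 2) butterflies.
    for u, v in combinations(neighbors, 2):
        common = len(neighbors[u] & neighbors[v])
        total += common * (common - 1) // 2
    return total


def estimate_butterflies(edges, keep_probability, rng):
    """Estimate the count from a random subset of the (distinct) edges.

    Each edge is kept independently with `keep_probability`, so a butterfly
    (four edges) survives with probability keep_probability ** 4; dividing
    the sampled count by that factor gives an unbiased estimate.
    """
    sampled = [edge for edge in edges if rng.random() < keep_probability]
    return count_butterflies(sampled) / keep_probability ** 4
```

Lowering `keep_probability` reduces the memory needed for the sample but increases the variance of the estimate, which is the space versus accuracy tradeoff the abstract refers to.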

arXiv.org e-Print Archive

Crossref

Selectivity estimation on set containment search

Author: K Tzoumas
R Baeza-Yates
R Jampani
S Helmer
S Melnik
S Suri
X Wang
Z Bar-Yossef
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

In this paper, we study the problem of selectivity estimation on set containment search. Given a query record Q and a record dataset S, we aim to accurately and efficiently estimate the selectivity of set containment search of query Q over S. The problem has many important applications in commercial fields and scientific studies. To the best of our knowledge, this is the first work to study this important problem. We first extend existing distinct value estimating techniques to solve this problem and develop IL-GKMV, an approach based on inverted lists and the G-KMV sketch. Our analysis shows that the performance of IL-GKMV degrades as the vocabulary size increases. Motivated by the limitations of existing techniques and the inherent challenges of the problem, we develop effective and efficient sampling approaches and propose OT-Sampling, a sampling approach based on an ordered trie structure. OT-Sampling partitions records based on element frequency and occurrence patterns and is significantly more accurate than both the simple random sampling method and IL-GKMV. To further enhance performance, a divide-and-conquer based sampling approach, DC-Sampling, is presented with an inclusion/exclusion prefix to explore the pruning opportunities. We theoretically analyse the proposed techniques regarding various accuracy estimators. Our comprehensive experiments on 6 real datasets verify the effectiveness and efficiency of our proposed techniques.

© Springer Nature Switzerland AG 2019.
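The simple random sampling baseline mentioned in this abstract can be sketched as follows. This assumes that "containment" means the query Q is a subset of a record X, which the abstract does not spell out; the function names are hypothetical, and the code is not OT-Sampling, DC-Sampling, or IL-GKMV.

```python
import random


def exact_selectivity(query, records):
    """Number of records that contain every element of the query."""
    q = set(query)
    return sum(1 for record in records if q <= set(record))


def sampled_selectivity(query, records, sample_size, rng):
    """Estimate the selectivity from a uniform sample without replacement."""
    sample = rng.sample(records, sample_size)
    hits = exact_selectivity(query, sample)
    # Scale the hit count in the sample up to the full dataset size.
    return hits * len(records) / sample_size
```

With `sample_size` equal to the dataset size the estimate is exact. For small samples and rare queries the estimate is often zero, which is one motivation for the more structured sampling schemes the paper proposes.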

Crossref

OPUS - University of Technology Sydney

Sublinear Estimation of Weighted Matchings in Dynamic Data Streams

Author: A. McGregor
C. Konrad
D. Gavinsky
J. Feigenbaum
K. Ahn
L. Epstein
M. Crouch
M. Zelke
N. Nisan
R. Uehara
W. Tutte
Z. Bar-Yossef
Publication date: 01/01/2015

This paper presents an algorithm for estimating the weight of a maximum weighted matching by augmenting any estimation routine for the size of an unweighted matching. The algorithm is implementable in any streaming model, including dynamic graph streams. We also give the first constant estimation for the maximum matching size in a dynamic graph stream for planar graphs (or any graph with bounded arboricity) using $\tilde{O}(n^{4/5})$ space, which also extends to weighted matching. Using previous results by Kapralov, Khanna, and Sudan (2014), we obtain a $\mathrm{polylog}(n)$ approximation for general graphs using $\mathrm{polylog}(n)$ space in random order streams. In addition, we give a space lower bound of $\Omega(n^{1-\varepsilon})$ for any randomized algorithm estimating the size of a maximum matching up to a $1+O(\varepsilon)$ factor for adversarial streams.

arXiv.org e-Print Archive

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Sparse recovery with partial support knowledge

Author: B.S. Kashin
D.L. Donoho
E.D. Gluskin
E.J. Candès
G. Cormode
M. Charikar
N. Shental
P.B. Milterson
R.G. Baraniuk
Z. Bar-Yossef
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

14th International Workshop, APPROX 2011, and 15th International Workshop, RANDOM 2011, Princeton, NJ, USA, August 17-19, 2011. Proceedings

The goal of sparse recovery is to recover the (approximately) best k-sparse approximation $\hat{x}$ of an n-dimensional vector x from linear measurements Ax of x. We consider a variant of the problem which takes into account partial knowledge about the signal. In particular, we focus on the scenario where, after the measurements are taken, we are given a set S of size s that is supposed to contain most of the "large" coefficients of x. The goal is then to find $\hat{x}$ such that

$\|x - \hat{x}\|_p \le C \min_{k\text{-sparse } x',\ \mathrm{supp}(x') \subseteq S} \|x - x'\|_q.$

We refer to this formulation as the sparse recovery with partial support knowledge problem (SRPSK). We show that SRPSK can be solved, up to an approximation factor of C = 1 + ε, using O((k/ε) log(s/k)) measurements, for p = q = 2. Moreover, this bound is tight as long as s = O(εn / log(n/ε)). This completely resolves the asymptotic measurement complexity of the problem except for a very small range of the parameter s. To the best of our knowledge, this is the first variant of (1 + ε)-approximate sparse recovery for which the asymptotic measurement complexity has been determined.

Space and Naval Warfare Systems Center San Diego (U.S.) (Contract N66001-11-C-4092); David & Lucile Packard Foundation (Fellowship); Center for Massive Data Algorithmics (MADALGO); National Science Foundation (U.S.) (Grant CCF-0728645); National Science Foundation (U.S.) (Grant CCF-1065125)
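The benchmark on the right-hand side of the SRPSK guarantee, for p = q = 2, is the error of the best k-sparse vector whose support lies inside S. The sketch below computes that benchmark directly from a known x. It does not perform recovery from measurements Ax, and the function names are hypothetical.

```python
import math


def best_k_sparse_in_support(x, support, k):
    """Best k-sparse approximation of x in the L2 sense, restricted to `support`."""
    # Keeping the k largest-magnitude entries inside the support minimizes
    # the squared error, because every dropped entry contributes its square.
    chosen = sorted(support, key=lambda i: abs(x[i]), reverse=True)[:k]
    approx = [0.0] * len(x)
    for i in chosen:
        approx[i] = x[i]
    return approx


def l2_error(x, approx):
    """Euclidean distance between x and its approximation."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, approx)))
```

For example, with x = [5, -4, 3, 0.5], S = {1, 2, 3} and k = 2, the large coefficient at index 0 lies outside S and cannot be used, so the benchmark keeps indices 1 and 2. A recovery algorithm meeting the guarantee must return an $\hat{x}$ whose error is at most C times this benchmark error.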

DSpace@MIT

Crossref